Automatic transliteration of a record in a first language to a word in a second language

ABSTRACT

This specification describes an innovative method for automatic transliteration of a record in a first language to a word in a second language.

TECHNICAL FIELD

The present specification relates to a method for automatic transliteration of records in a first language to relevant words in a second language.

BACKGROUND ART

Transliteration is used in many modern applications, particularly in text applications for mobile telephones, e.g., SMS (Short Message Service), and for network data retrieval, e.g., in Internet search engines.

As a rule, transliteration, i.e. recording the words of one language by means of characters of another language, is carried out using tables of correspondence between letters or, more often, between sounds, and sometimes between syllables of a first language and their representation by letters of a second language. Such methods are described, for example, in U.S. Pat. No. 6,460,015 (published on Oct. 1, 2002), U.S. Pat. No. 6,810,374 (published on Oct. 26, 2004), U.S. Pat. No. 7,376,648 (published on May 20, 2008), in US Patent Applications 2005/0182616 (published on Aug. 18, 2005), 2005/0216253 (published on Sep. 29, 2005), 2006/0143207 (published on Jun. 29, 2006), 2007/0288230 (published on Dec. 13, 2007), 2008/0097745 (published on Apr. 24, 2008), as well as in International Patent Application WO 01/20435 (published on Mar. 22, 2001). A disadvantage of all such methods is their failure to unambiguously transliterate the same combinations of letters or sounds in various words.

Chinese Patent 1193780 (published on Sep. 23, 1998) describes a “multi-purpose” method of transliteration. Its multi-purpose nature consists in the fact that any language is first translated into Esperanto and afterwards transliterated with Latin letters. It is clear that the extra stage of representing the text in Esperanto, which requires the participation of a human, would inevitably lengthen the transliteration process, make it more expensive, and introduce extra errors.

US Patent Application 2008/0270111 (published on Oct. 30, 2008) describes a transliteration method wherein each vowel and consonant has a unique representation in Unicode. In this method various phonetic and pseudo-phonetic transliteration variants are embodied, and generated words with preset information about them are grouped. The resultant variants are analyzed with due consideration for such information, applying preset transliteration rules. A disadvantage of this method is that letters are uniquely represented by a code equivalent, whereas various combinations of letters may sound different, which is not taken into account.

Canadian Patent Application 2630949 (published on Nov. 21, 2008) describes a transliteration method that involves an attempt to replace chains of letters in a first language with chains of letters in a second language, determination of the replacement probability, and then selection of the most probable replacement based on a preset criterion. A similar method is used in the laid-open Japanese Patent Application 2005-092682 (published on Apr. 07, 2005). However, both methods are rigidly linked to the pair of languages considered in each method (English and Arabic or English and Japanese, respectively), which are weakly inflected. Therefore, they are hardly adaptable to other cases.

SUMMARY

This specification describes an innovative method for automatic transliteration of a record in a first language to a word in a second language. This method includes the following: receiving a sequence of signals, each of which encodes the corresponding character in a first language record to be transliterated, and saving the received signal sequence to a memory;

during stepwise analysis of the signals from the start to the end of the saved signal sequence, finding all transliteration rules for the character being analyzed at the given step, or for a character group starting with the character being analyzed at the given step, in the first language record to be transliterated, by looking up in an appropriate rule base created in advance;

complementing all the transliteration variants obtained at the previous stage with characters of the second language from each transliteration rule found at the given step, thus obtaining at the given step composite variants of transliteration associated with this step or with one of the subsequent steps;

at each step, comparing each of the composite transliteration variants obtained at the given step against starting segments of words in the second language, by looking up in an appropriate base of starting segments of words in the second language, and if a particular transliteration variant associated with the given step matches a starting segment of a word in the second language, retaining this transliteration variant for analysis in subsequent steps;

on completion of the last step of the analysis, accepting at least one transliteration variant in the second language associated with the last step as the transliteration result of the record in the first language; and

creating a sequence of signals, each of which encodes the corresponding character in the resulting transliterated word of the second language.

An optional feature of the method is that the received signal sequence is complemented at the start and end with signals corresponding to preset characters, and the first step of analysis is started from the first of the preset characters. The preset characters can be alphabetic or non-alphabetic, and they may be different characters or the very same preset character.

Another optional feature of the method is that the characters used for the record in the first language are selected from the group made up of: alphabetic characters of the first language; non-alphabetic characters, each resembling some alphabetic character of the second language; and non-alphabetic characters that in combination resemble some alphabetic character of the second language.

One more optional feature of the method is that the base of starting segments of words in the second language is created directly during the comparison, from chains no longer than m characters, which chains are obtained from actual words of the second language, and the starting segments of words in the second language are created by imposition of the chains with a shift by one character, wherein m>1 is an integer pre-selected for the second language; during this process, the occurrence frequency in the second language is determined for each of the chains, and the chains with occurrence frequency below a preset limit are removed.

In the latter case, for each particular transliteration variant associated with the given step and no more than m characters long, its probability is determined by finding the occurrence frequency of the corresponding chain, while for each particular variant that is longer than m characters and is associated with the given step, its probability is determined by multiplying the occurrence frequencies of the chains m characters long that belong to the given particular transliteration variant, the chains overlapping because each subsequent chain is shifted by one character from the start of the variant; if any of the chains is missing, the variant is discarded.

At the same time, during determination of the probabilities of transliteration variants, the probability calculated at a particular step of analysis is multiplied by a pre-selected weighting factor of the rule which is used at the given step of analysis to find the transliteration variant.

One more optional feature of the method is that when the base with starting segments of words in the second language is complemented directly during the comparison, if no variant of transliteration in the second language is found upon completing the analysis for the record in the first language being analyzed, the stepwise analysis of signals from the start to the end of the saved signal sequence can be repeated, and if a particular transliteration variant associated with the given step does not match any starting segment of some word in the second language, this transliteration variant is assigned a tentative occurrence frequency equal to a small preset value; at the same time, transliteration variants having higher probability are saved.

Moreover, in the same situation, the number of transliteration variants that can be saved at each step of the analysis shall not exceed a preset number of the transliteration variants ending with the same “m−1” characters of the second language and having higher probability.

Another optional feature of the method is that in the case that a transliteration variant having k chains is compared to a transliteration variant having q chains, where k and q are positive integers, the probabilities of both transliteration variants can be normalized by k and q, respectively, before the comparison, by extracting the k-th or q-th root, respectively, from the corresponding probability.

One more optional feature of the method is that, besides the number of chains, the number of rules used in each of the compared variants can also be taken into account.

Finally, one more optional feature of the method is that a logarithmic measure of probability can be used as the probability of a transliteration variant.

The method can be implemented as first described above or with any number of the optional features described above. The steps of the method and all combinations of the optional features can also be implemented as corresponding systems, apparatus, and computer programs recorded on computer storage devices, each configured to perform the steps of the method and any implemented optional features.

DETAILED DESCRIPTION

The following describes a detailed exemplary embodiment of the present method, illustrated by an example of transliterating records written with English (Latin) letters into records in Russian (Cyrillic letters). However, the example merely depicts an embodiment of the present method, and under no circumstances shall it be treated as a limitation of the method, the scope of which is determined only by the attached claims.

The method actually begins with a preparatory stage, where any public or other databases are used to collect statistics of ways to write certain combinations of letters in one (first) language with letters of another (second) language. For the purpose of transliteration it is necessary to have two sets of data:

(1) Transliteration rules, i.e. a set of rules describing the potential transliteration process, for example, sh→ш. The rules need not be one-to-one (e.g. s→с, c→с), and can even be overlapping (e.g. s→с, ch→ч, but sch→щ). One way of taking overlaps into account involves assigning weighting coefficients to each rule. So, a transliteration rule represents a set that includes:

-   a combination, i.e. a set of Latin letters denoting a piece of a record, that, once found in the record, may be replaced according to such a rule;
-   contents, i.e. a chain of letters of the target language (in our examples, Russian), into which the combination is to be transformed; and
-   weight, i.e. a rational number identifying the significance of the rule among the other rules.
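The three-part rule described above can be sketched as a small data structure. This is only an illustration: the class name `Rule`, the helper `rules_at`, and the sample mappings and weights are assumptions made for the example, not values taken from the rule base of the method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    combination: str     # source-language letters that the rule matches
    contents: str        # target-language letters that replace them
    weight: float = 1.0  # significance of the rule among the other rules


# Illustrative rules only; the mappings and weights are assumed.
RULES = [
    Rule("s", "с"),
    Rule("h", "х"),
    Rule("sh", "ш"),
    Rule("ch", "ч"),
    Rule("sch", "щ", 0.5),
]


def rules_at(record, pos, rules=RULES):
    """Return every rule whose combination matches the record at position pos."""
    return [r for r in rules if record.startswith(r.combination, pos)]


# Overlapping rules: both "s" and "sch" match at the start of "schy".
print([r.combination for r in rules_at("schy", 0)])
```

Keeping the rules as plain immutable values makes it straightforward to look up all rules that start at a given position, which is the operation the stepwise analysis relies on.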

(2) Statistics of variants, i.e. statistical information collected from various databases and used to identify the best candidate (the one we consider the best among others) for the given transliteration in the target language.

For the purpose of this description, “record” means the input of transliteration. It is a word of the target language written with the alphabet of the first language. For example, we transliterate “mir” -> “мир”. Here “mir” is the (input) record, and “мир” is its transliteration, the word of the target language (peace).

From the general point of view, everything seems simple. There is a set of rules (in the case under consideration, a set of mappings from sequences of Latin letters to sequences of Russian letters) that identifies the transliteration. Then, it is “only” required to apply each potential rule to each potential sequence to obtain a number of potential transliteration variants. However, this is not practical: for example, the record zashtsheeshtshayoushtsheekhsya of a Russian word provides 34 million transliteration variants.
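To see why exhaustive application of rules blows up, the number of ways a record can be split into rule applications can be counted without enumerating them. The sketch below is an illustration under assumed toy rules (the targets `Y1` to `Y4` and the other mappings are placeholders); it counts rule-application paths, which is an upper bound on the number of distinct variants.

```python
def count_variants(record, rules):
    """Count the ways to cover the whole record with matching rules.

    rules is a list of (combination, contents) pairs; ways[t] is the
    number of rule sequences that consume exactly the first t characters.
    """
    ways = [0] * (len(record) + 1)
    ways[0] = 1
    for t in range(len(record)):
        if ways[t] == 0:
            continue
        for combination, _contents in rules:
            if record.startswith(combination, t):
                ways[t + len(combination)] += ways[t]
    return ways[len(record)]


# Toy rule set (assumed): two readings of "sch", four readings of "y".
TOY_RULES = [
    ("s", "S"), ("ch", "C"), ("sch", "X"),
    ("y", "Y1"), ("y", "Y2"), ("y", "Y3"), ("y", "Y4"),
]
print(count_variants("schy", TOY_RULES))
```

Because the counts multiply along the record, a long record with several ambiguous letter groups quickly reaches millions of paths, which motivates the pruning described later in the text.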

As transliteration is generally ambiguous, a record could be transliterated in many inconsistent ways. For example, Russian is abundant in groups of obviously inconsistent rules:

-   -   ‘y’→‘        ’, ‘y’→‘        ’, ‘y’→‘        ’, ‘y’→‘        ’;    -   ‘ch’→‘        ’, →‘sch’→‘        ’;    -   . . .

Thus, the record

schy

corresponding to word

has at least 2×4=8 transliteration variants. Therefore, a certain method of selecting a preferable variant is required. Let us specify a function μ for all such spelling variants, and compare μ(T_i) and μ(T_j), where T_i and T_j represent comparable spelling variants (i≠j). One might assume that, if T is a dictionary word or the start of a certain dictionary word, then μ(T)=1; otherwise, μ(T)=0. However, any dictionary contains only a limited number of the words (by which we mean combinations of letters) used in the given language, and the method is designed to work even for mistakenly written words. Therefore, a different approach will be used.

Let us select a certain integer m, and introduce the following definition:

μ(T) = μ_T, if the length of T is less than or equal to m;

μ(T) = μ(T[1..m]) × μ(T[2..m+1]) × ... × μ(T[k..m+k−1]), if the length of T is equal to m+k−1, where the designation T[a..b] represents the subsequence of T from position a to position b inclusive.

Thus, it is required to define all constants μ_T for a complete definition of μ(T) for any T. Let us assume that μ_T (for any subsequence T of length ≤m) represents the number of times T is found as a subsequence of target language words in analyzed target language texts taken from a certain source, for example, from Internet web pages.

Quantitative values (μ_T) can always be transformed into probabilities by dividing each such value by the total sum. In this case, the function μ is defined for all variants T of the target language, and expresses the degree of similarity of T to words from the analyzed source. It can be used as a probability measure of the target language variants.
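The definition of μ can be sketched in a few lines. This is a minimal illustration, assuming the chain statistics are held in a dictionary; the names `to_frequencies`, `mu` and `chain_freq` are chosen for the example and do not come from the specification.

```python
def to_frequencies(counts):
    """Turn raw chain counts into relative frequencies (divide by the total)."""
    total = sum(counts.values())
    return {chain: n / total for chain, n in counts.items()}


def mu(variant, chain_freq, m):
    """Probability measure of a variant built from m-chain frequencies.

    A variant of at most m characters uses its own frequency; a longer
    variant multiplies the frequencies of its overlapping m-chains,
    each shifted by one character. A missing chain contributes zero.
    """
    if len(variant) <= m:
        return chain_freq.get(variant, 0.0)
    p = 1.0
    for i in range(len(variant) - m + 1):
        p *= chain_freq.get(variant[i:i + m], 0.0)
    return p


freq = to_frequencies({"abc": 1, "bcd": 3})
print(mu("abcd", freq, 3))  # product of the frequencies of "abc" and "bcd"
```

A variant that contains any chain absent from the statistics gets the value zero, which corresponds to discarding it.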

An m-long piece of a target language word will be referred to as an m-chain (or just a chain). The term m-chain will also be used to refer to the start of a word that is less than m characters long, to be able to identify the probability of a variant that has not yet reached a length of m characters. A suitable value for m depends on the language and can be determined heuristically, for example. On the one hand, m should be a small number because the total number of chains to be saved in the memory grows as the m-th power of the alphabet size. On the other hand, given too small an m, words with frequently occurring chains can prove preferable to the correct ones. For example, let the correct variant of a record transliteration be abcdef for a language with m=3; if, for instance, μ[def]<μ[del], then the variant abcdel will prove preferable.

For Russian, it works well to take m equal to 6. Note that when using text from the Internet, not every chain is used. To cut out rare words and obtain more consistent data, a minimum frequency (occurrence) threshold is applied.
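Collecting chain statistics with a minimum occurrence threshold might look like the sketch below. It is an illustration only: the function name, the `min_count` parameter, and the use of `^` and `$` as word-boundary markers are assumptions made for the example.

```python
from collections import Counter


def build_chain_counts(words, m, min_count=1, start="^", end="$"):
    """Count m-chains over a word list and drop the rare ones."""
    counts = Counter()
    for word in words:
        s = start + word + end
        # Word starts shorter than m characters also count as chains.
        for n in range(1, min(m, len(s) + 1)):
            counts[s[:n]] += 1
        # Full m-character windows, each shifted by one character.
        for i in range(len(s) - m + 1):
            counts[s[i:i + m]] += 1
    return {chain: n for chain, n in counts.items() if n >= min_count}


print(build_chain_counts(["ab", "ac"], m=3, min_count=2))
```

With `min_count=2`, only the chains shared by both toy words survive, which mirrors how the threshold removes chains that come from rare or misspelled words.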

In comparing two spelling variants and picking the best one, one cannot simply compare probabilities (μ) if such variants differ in length, since the shorter variant has a higher probability due to the lower number of multipliers in the product. A special function is used for calculating the difference between such lengths and for multiplying the probability of the shorter variant by the mean probability used in creating the variants to be compared. Thereby, a normalization of the variants is attained, as if their lengths were equal.

For further description, let us assume that each record in the source (first) language is provided at the start and at the end with certain preset characters. The method does not require such additional characters, but they are fairly helpful for illustrating its embodiment. These characters can be either the same or different, either alphabetic or non-alphabetic. The next illustrative example uses the character ^ at the start of the record and the character $ at the end of the record. Then, for example, the 5-chain [^abcd] denotes a sequence of letters at the start of a word, while the 5-chain [^who$] denotes the individual word who.

Let us introduce another definition: the end of a transliteration variant T is represented by the last m−1 letters of T, where m is the size of chains used for this language. Let us also point out that, if variant T is shorter than m letters, its end will be represented by variant T itself. For example, when 6-chains are used for Russian, the end of variant T=^Метропол will be represented by [ропол].

It should be noted that hereinafter records of the words in the first language are termed words for the sake of simplicity, although in the strict sense they do not constitute words.

The following result, presented as a theorem, will be referred to later in this specification.

Theorem. Let us assume that there are statistics of a language with probability measure μ on the basis of m-chains. Let there be a record W of length n, and its best (according to μ) transliteration variant T=T(W) of length k.

For any positive integer t (t≤n), let us consider W[1..t], the initial segment of record W up to position t inclusive. T[1..s] is the part of T corresponding to the transliteration of W[1..t].

Then T[1..s] is the best (according to μ) variant among all other variants of the transliteration of W[1..t] with the same end.

Example. For the Russian language (m=6) and record W=^zashtsheeshtshayoushtsheekhsya$, the best variant is T=^защищающихся$. For t=24, W[1..24]=^zashtsheeshtshayoushtsh, and corresponding to it there is the sub-variant (s=9) T[1..9]=^защищающ. Then T[1..9] is the best variant among all other variants of the transliteration of W[1..24] with the same end [ищающ].

Proof. Let us assume that for W[1..t] there is another variant T′[1..s′] with the same end, and μ(T′[1..s′])>μ(T[1..s]). That is, variant T′[1..s′] is better than T[1..s].

Let us construct T′ as the concatenation T′[1..s′]+T[s+1..k]. Then T′ will also constitute a variant of the transliteration of W, as a fusion of the transliterations of W before position t and after it.

Then, considering that T[1..s] and T′[1..s′] by hypothesis have the same ends, we obtain:

μ(T) = μ(T[1..s]) × μ(T[s−m+1..s+1]) × μ(T[s−m+2..s+2]) × ... × μ(T[s..s+m]) × μ(T[s+1..k]),

μ(T′) = μ(T′[1..s′]) × μ(T[s−m+1..s+1]) × μ(T[s−m+2..s+2]) × ... × μ(T[s..s+m]) × μ(T[s+1..k]).

Note that if s<m, s′<m or s+m>k, certain multipliers will disappear.

Since, by hypothesis, the first factor of the product for μ(T′) is larger than the first factor for μ(T), and the rest of them are equal, μ(T′)>μ(T). This contradicts the hypothesis that T is the best variant for W.

It should be noted that this result is true even if all the words are short (having length less than m).

Thus, the algorithm itself, as embodied by the present method, is fairly simple. As an option in the illustrative example, the signs ^ and $ are added to the record in the first language to indicate the start and the end of this record. Then, the record undergoes stage-by-stage analysis from the first to the last character (including the added signs). In reviewing the position of each subsequent letter, all suitable rules are applied starting from the current position, which results in a number of transliteration variants at the given stage of the analysis. Then, the rules are applied to the previously obtained variants, and eventually the variants of the entire word are obtained.

At each stage, only variants with unique ends will be retained. If some variants have the same ends, the most probable variant will be retained. (In the event that the M most probable resultant variants are required, the M best variants with the same end shall be retained.) We are entitled to do this by the inference from the main theorem that the best variants arise out of the best sub-variants (among those having the same ends).
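Retaining only the best variants per end can be sketched as a grouping step. The function name `retain_top`, the `(probability, text)` tuple layout and the `limit` parameter (playing the role of M) are assumptions made for this illustration.

```python
import heapq
from collections import defaultdict


def retain_top(variants, m, limit=1):
    """Keep at most `limit` most probable variants for each distinct end.

    variants is an iterable of (probability, text) pairs; the end of a
    variant is its last m-1 characters (m > 1 is assumed).
    """
    by_end = defaultdict(list)
    for prob, text in variants:
        by_end[text[-(m - 1):]].append((prob, text))
    kept = []
    for group in by_end.values():
        kept.extend(heapq.nlargest(limit, group))
    return kept


stage = [(0.3, "^abc"), (0.1, "^xbc"), (0.2, "^ybc"), (0.05, "^abd")]
print(sorted(retain_top(stage, m=3, limit=2), reverse=True))
```

Variants with different ends never compete with each other here, so a currently weaker variant with a unique end survives to later stages.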

Let us point out that sub-variants having different ends are not compared with one another. Let us assume that variant abc is better than variant aBC; the application of the following rule may nevertheless make variant abcd inferior to variant aBCd. Therefore, if the ends are different, we retain all the variants in the hope that some of them can give the best estimate.

Retaining only the best variants at each stage keeps our working set as small as required, and spares us from iterating over every possible transliteration.

Each time we apply a certain rule to a certain variant, we update its probability according to the chains that we added, and multiply it by the weight of the rule.

Example. Let m=6 for Russian, and let the input record be W=^metropol$.

Assume that after transliteration

̂metropo

, we obtain variant {̂

e

po

o}. One of the rules that corresponds to the last letter

l

is rule R:

l

→

. Then, we add {

}, thus obtaining

̂Me

po

o

, and have to recalculate its frequency multiplying it by μ[

po

o

]×μ[po

o

arib]×weight (R).

Finally, we use the most frequent variant or variants that we obtained at the final stage, and remove the signs ^ and $.

Now, let us consider implementation of this algorithm in the present method.

As this method is designed for automatic transliteration, all the records to be transliterated are made suitable for computer processing. Namely, the records are coded in any acceptable code (e.g. ASCII, KOI-8, Unicode, etc.) and transformed into appropriate signals to be transmitted through communication lines or recorded to a machine-readable medium. A sequence of such signals, by means of which the first language record to be transliterated was coded as a sequence of characters, will be received for further processing in a duly programmed data processing unit. The latter can be a personal computer (PC), personal digital assistant (PDA), mobile telephone, server or any other device that can be provided with a program written according to the method stated in the present description. This received sequence of signals will optionally be supplemented at its start and end with signals corresponding to preset characters. As already noted above, the characters can be the same or different, alphabetic or non-alphabetic. For example, if alphabetic characters are used, all characters of a record to be transliterated can be lower-case characters, while the start and end of such a record can be supplemented with capital letters. The method will be performed on this data.

To do this, a first character will initially be selected, and itsappropriate transliteration rule will be searched. Normally, there arerules that transliterate characters designating start and end of arecord respectively into themselves. However, a rule can be introducedthat will be applicable to only the record character, should it rankfirst. For example, ̂

→̂

. The last characters will be treated similarly, e.g. yov$→ев$. Then, the second character (i.e. the first letter of the record) will be chosen, and the transliteration rules corresponding to it will be searched for. Each found rule, i.e. the corresponding character or chain of corresponding characters (if a character of the source language can be transliterated by means of, say, two characters of the target language, e.g.

l

→

) will be used to supplement each transliteration variant found in the previous iteration. In the given case, the non-alphabetic character introduced at the record start and transliterated into itself will be supplemented with characters of the second language from each transliteration rule found at this stage for the first character of the source language. In principle, a record in the first language can use characters selected from a group made up of, for example, the following: alphabetic characters of the first language; non-alphabetic characters, each of which bears resemblance to one of the alphabetic characters in the second language (for example, figure

can be used to record Russian letter

instead of Latin letter

); non-alphabetic characters, the combination of which bears resemblance to one of the alphabetic characters in the second language (for example, the pair of characters /\ (forward and backward slash characters) could be used to record Russian letter

instead of Latin letter

).

The next stage of the analysis involves picking up the second letter of the record being transliterated (i.e. the third character, if the start and end of the record are supplemented with non-alphabetic characters), and searching for transliteration rules for this second letter. All second language characters found in this way will supplement the chains previously made up of second language characters, i.e. the chains consisting of the non-alphabetic character that starts the record, supplemented with the appropriate second language characters found for the first letter of the record in the source (first) language.

Let us also point out that the rules are applicable not only to individual characters of the record, but also to groups of characters starting with the analyzed character. In this case, the engendered variants fall into the stage that precedes consideration of the record character following those transformed by the rule.

These analysis stages are repeated until no more characters of the record being transliterated remain, i.e. in the present case, up to the non-alphabetic character added to the end of the record. At each stage of analysis, each chain of second (target) language characters already made up at the previous stage will be supplemented with the characters found at the present stage. Thereafter, each of the resulting chains of target language characters is compared with initial segments of second language words by reference to the appropriate base of initial segments. Such a comparison process involves picking, at each stage, the variants having a higher occurrence frequency than the other variants with the same ends, out of the many transliteration variants found in the course of the above comparison:

1. We parse the input record from the beginning to the end, character by character.

2. At each stage, we find all rules matching the input record from the current character onward.

3. All chains already made up at the previous stage are supplemented with the contents of the found rules.

4. By referring to the base of initial segments of second language words, we assign a probability to each variant. If no initial segment equal to the variant is found, the variant is dropped.

5. Variants with the same ending (the same last m−1 characters) at the same stage are compared by their probability. The variant with the higher probability is saved; the others are dropped.
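The five steps above can be put together in a compact sketch. It is a simplified illustration under several assumptions: rules are `(combination, contents, weight)` tuples with non-empty combinations, chain frequencies are supplied in a ready-made dictionary, upper-case Latin letters stand in for the target alphabet, and the probability is recomputed from scratch for every candidate instead of being updated incrementally. All names are hypothetical.

```python
def mu(variant, chain_freq, m):
    """Chain-based probability measure; a missing chain gives zero."""
    if len(variant) <= m:
        return chain_freq.get(variant, 0.0)
    p = 1.0
    for i in range(len(variant) - m + 1):
        p *= chain_freq.get(variant[i:i + m], 0.0)
    return p


def transliterate(record, rules, chain_freq, m, start="^", end="$"):
    """Return the most probable transliteration, or None if none survives."""
    src = start + record + end
    # The boundary signs transliterate into themselves.
    all_rules = list(rules) + [(start, start, 1.0), (end, end, 1.0)]
    # stages[t] maps an end (last m-1 chars) to (score, weights, variant).
    stages = [dict() for _ in range(len(src) + 1)]
    stages[0][""] = (1.0, 1.0, "")
    for t in range(len(src)):                                  # step 1
        for _score, weights, variant in list(stages[t].values()):
            for combination, contents, weight in all_rules:
                if not src.startswith(combination, t):         # step 2
                    continue
                new_variant = variant + contents               # step 3
                new_weights = weights * weight
                score = new_weights * mu(new_variant, chain_freq, m)
                if score == 0.0:                               # step 4
                    continue
                key = new_variant[-(m - 1):]                   # step 5
                bucket = stages[t + len(combination)]
                if key not in bucket or bucket[key][0] < score:
                    bucket[key] = (score, new_weights, new_variant)
    final = stages[len(src)]
    if not final:
        return None
    best = max(final.values(), key=lambda item: item[0])[2]
    return best[len(start):len(best) - len(end)]


# Toy data (assumed): "X" stands in for the single letter written as "sh".
RULES = [("s", "S", 1.0), ("h", "H", 1.0), ("sh", "X", 1.0), ("a", "A", 1.0)]
FREQ = {"^": 1.0, "^S": 0.6, "^X": 0.4, "^XA": 0.3, "XA$": 0.3}
print(transliterate("sha", RULES, FREQ, m=3))
```

A rule that consumes several source characters places its result directly into the later stage, which corresponds to the remark that such variants are associated with one of the subsequent steps.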

In principle, the base of initial segments of second language words can be made up beforehand using a data collection approximating all words in the second language. However, this would require a very voluminous database. Therefore, it is more efficient to follow another way, and to form this base of initial segments of second language words in the course of the above comparison with initial segments of real words in the second language. Such a base will be compiled from chains no longer than m characters taken from real words in the second language, while initial segments of second language words will be made up by overlaying such chains with a single-character shift. The probability of each specific transliteration variant associated with the above stage and not exceeding m characters in length will be the occurrence frequency of the appropriate chain, while the probability of each specific transliteration variant associated with the above stage and exceeding m characters in length will be found by multiplying the occurrence frequencies of the m-character chains contained in that variant, the chains overlapping because each next m-character chain is shifted by one character from the start of the variant. For example, for variant

, the previously found occurrence frequencies for chains

,

and

will be multiplied. The resultant product will constitute the targetprobability for variant

. Each of the above chains having an occurrence frequency in the second language lower than a preset limit can initially be removed from the chains' database; and if a certain chain is absent, the corresponding transliteration variant may be discarded.

In certain cases, it seems expedient, whilst calculating the probabilities of transliteration variants, to multiply the probability calculated at a specific stage of analysis by a previously selected weight coefficient assigned to the rule used at that stage to find the transliteration variant. This helps to estimate variants based not only on their characters, but also on the rules that generated them. It is useful in some cases, when exclusion of the above variants may result in failure to find a transliteration for the entire record in the source language, since without assigning such a small probability, transliteration results obtained in the previous iterations may be discarded.

In the case when, on completion of the last stage of analysis, no transliteration variant in the second language has been found for the analyzed record in the first language, the above stage-by-stage analysis of signals will be repeated from the start to the end of the stored signal sequence, and where a stage-specific transliteration variant does not coincide with any initial segment of a word in the second language, such a transliteration variant will be assigned a fake occurrence frequency with a preset low value. In this case, no more than a preset number of transliteration variants will be retained at each analysis stage among those having the same end of m−1 characters of the second language, namely those whose frequency is higher than that of the rest of the variants with the same end. This would be helpful for transliteration into such languages as Arabic, where the appearance of the same letter can vary depending on its position at the start, middle or end of a word. If the target language has no such peculiarities, the most probable variant will be chosen from the transliteration variants with the same last m−1 characters of the target language.
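The repeated pass with a small substitute frequency can be illustrated by a variant of the probability function in which a missing chain no longer yields zero. The name `mu_with_floor` and the default `floor` value are assumptions made for this sketch.

```python
import math


def mu_with_floor(variant, chain_freq, m, floor=1e-9):
    """Chain-based probability where unknown chains get a small preset value.

    Intended for a second pass only, after the strict pass (where a
    missing chain discards the variant) has produced no result.
    """
    if len(variant) <= m:
        return chain_freq.get(variant, floor)
    p = 1.0
    for i in range(len(variant) - m + 1):
        p *= chain_freq.get(variant[i:i + m], floor)
    return p


freq = {"abc": 0.5, "bcd": 0.25}
print(mu_with_floor("abce", freq, 3, floor=0.001))  # "bce" is unknown
```

Known chains still dominate the score, so variants built mostly from attested chains remain ahead of variants that rely on the substitute value.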

In the case of comparing a transliteration variant consisting of k aforementioned chains and rules with a transliteration variant consisting of q aforementioned chains and rules, where k and q are positive integers, the probabilities of both transliteration variants will be normalized prior to comparison in terms of k or q, respectively, by extraction of the k-th or q-th root, respectively, of the appropriate probability values.

For example, with the initial record ^shkola$, at the third stage we have variant V₁=^сх, consisting of a single chain and three rules (^→^, s→с, h→х), and variant V₂=^ш, consisting of a single chain and two rules (^→^, sh→ш). In this case, it is not the probability values themselves (i.e. the first roots) that shall be compared but, respectively, the fourth root of the occurrence frequency of variant V₁ and the third root of the occurrence frequency of variant V₂.
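The root normalization can be sketched as follows. The function names and the sample probability values are assumptions chosen only to show the effect; they are not figures from the example above.

```python
import math


def normalized(probability, parts):
    """Take the parts-th root, where parts = number of chains plus rules."""
    return probability ** (1.0 / parts)


def first_is_better(p1, parts1, p2, parts2):
    """Compare two variants after normalizing each by its own part count."""
    return normalized(p1, parts1) > normalized(p2, parts2)


# Assumed values: a variant made of 4 parts against a variant made of 3 parts.
print(normalized(1e-4, 4), normalized(8e-3, 3))
print(first_is_better(1e-4, 4, 8e-3, 3))
```

Taking the root turns a product of factors into something like a per-factor average, so a variant is not penalized merely for having been built from more pieces.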

One may point out that it is expedient to use the logarithmic measure of the frequency of occurrence of chains of second language characters as a probability, since the values are fairly small due to the huge numbers of words relied upon in creating the above base of rules and base of initial segments.
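A logarithmic version of the measure replaces the product of chain frequencies with a sum of logarithms. The sketch below is an illustration with assumed names (`log_mu`, `chain_freq`); a missing chain is mapped to negative infinity, which plays the role of a zero probability.

```python
import math


def log_mu(variant, chain_freq, m):
    """Sum of log-frequencies of the m-chains of a variant."""
    if len(variant) <= m:
        chains = [variant]
    else:
        chains = [variant[i:i + m] for i in range(len(variant) - m + 1)]
    total = 0.0
    for chain in chains:
        freq = chain_freq.get(chain, 0.0)
        if freq <= 0.0:
            return float("-inf")  # unknown chain: variant is impossible
        total += math.log(freq)
    return total


freq = {"abc": 0.5, "bcd": 0.25}
print(log_mu("abcd", freq, 3))
```

In double precision a product such as 1e-200 * 1e-200 underflows to 0.0, while the corresponding sum of logarithms stays finite. In the logarithmic form, extracting a k-th root also becomes a simple division by k.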

When processing of the entire source-language record to be transliterated has been completed through the multiple stages described above, the most probable final result is treated as the target transliteration in the target language. A signal sequence encoding the transliterated word in the second language is then generated; this sequence can, for example, be transmitted over communication lines or written to a respective memory.
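
The final selection and encoding step might look like the sketch below. Everything here is an assumption made for illustration: the function name `finish`, the dictionary mapping each surviving final variant to its probability, the use of UTF-8 as the signal encoding, and the removal of the `^` and `$` boundary markers before encoding. The variant strings are placeholders, and the two probabilities are the final-stage values listed in Example I.

```python
def finish(final_variants, encoding="utf-8"):
    """Pick the most probable final variant and encode it as bytes.

    final_variants -- dict: variant string (with ^ and $ markers) -> probability
    Returns None when no variant survived the last stage.
    """
    if not final_variants:
        return None
    word = max(final_variants, key=final_variants.get)
    word = word.strip("^$")       # assumed: markers are not part of the output
    return word.encode(encoding)  # byte sequence standing in for the signals


# Placeholder variants with the final-stage probabilities of Example I.
signals = finish({"^variant_a$": 2.55749e-11, "^variant_b$": 3.4578e-08})
```

A `None` result corresponds to the situation in which the analysis has to be repeated with the fallback frequency.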

Transliteration examples in accordance with the method described above are shown below. The examples list the variants attributed to each transliteration stage and retained (not discarded) for analysis at subsequent stages.

EXAMPLE I

Transliteration of record olga.

End | Probability | Transliterated part | Applied rules (with weights)

1: ^|olga$
[^] | 1 | “^” | ^→^

At the first stage, the initial signal was transliterated into itself.

2: ^o|lga$
[^o] | 0.0198769 | “^o” | ^→^ o→o

3: ^ol|ga$
[^o ] | 0.000146119 |  | ^→^ o→o l→
[^o ] | 1.56053e−06 |  | ^→^ o→o l→  w: 0.05

At the third stage, the second letter of the record was transliterated in two ways, giving rise to two variants. Since their ends differ, according to the theorem, no attempt at comparing and discarding was made.

4: ^olg|a$
[^o ] | 4.04846e−07 |  | ^→^ o→o l→  g→
[^o ] | 1.74308e−09 |  | ^→^ o→o l→  g→  w: 0.01
[^o ] | 1.33855e−06 |  | ^→^ o→o l→  w: 0.05 g→

At the fourth stage, the next letter, g, which was also transliterated in two ways, engendered four variants. However, as chain [^o ] has zero probability, its variant was discarded.

5: ^olga|$
[^o a] | 9.65577e−08 |  | ^→^ o→o l→  g→  a→a
[^o a] | 3.8423e−10 |  | ^→^ o→o l→  g→  w: 0.01 a→a
[o a] | 1.11676e−06 |  | ^→^ o→o l→  w: 0.05 g→  a→a

6: ^olga$
[ a$] | 2.55749e−11 |  | ^→^ o→o l→  w: 0.05 g→  a→a $→$
[o a$] | 3.4578e−08 |  | ^→^ o→o l→  g→  a→a $→$

Result:

EXAMPLE II

Transliteration of record shashka.

This example illustrates rules whose results are assigned to a later stage than the one at which the rule was applied. For example, the first character s and the subsequent characters were analyzed at the second stage. One of the rules, sh→

, affected the second and third positions of the record at once. Therefore, variant

engendered by this rule was referred directly to the third stage.

1: ^|shashka$

[{circumflex over ( )}] 1 {circumflex over ( )} {circumflex over( )}→{circumflex over ( )}2: ̂slhashka$

[{circumflex over ( )} 

 ] 8.10924e−05 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} s→ 

  w: 0.01 [{circumflex over ( )}c] 0.0309413 {circumflex over ( )}c{circumflex over ( )}→{circumflex over ( )} s→c [{circumflex over ( )}C 

 ] 5.14555e−05 {circumflex over ( )}C 

  {circumflex over ( )}→{circumflex over ( )} s→C 

[{circumflex over ( )}C 

 ] 4.51423e−07 {circumflex over ( )}C 

  {circumflex over ( )}→{circumflex over ( )} s→C 

  w: 0.1 [{circumflex over ( )} 

 ] 1.80712e−06 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} s→ 

  w: 0.0013: ̂shlashka$

[{circumflex over ( )} 

 x] 7.64281e−09 {circumflex over ( )} 

 x {circumflex over ( )}→{circumflex over ( )} s→ 

  w: 0.01 h→x [{circumflex over ( )}cx] 9.61574e−05 {circumflex over( )}cx {circumflex over ( )}→{circumflex over ( )} s→c h→x [{circumflexover ( )} 

 ] 0.00180712 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

[{circumflex over ( )} 

 x] 3.17676e−09 {circumflex over ( )} 

 x {circumflex over ( )}→{circumflex over ( )} s→ 

  w: 0.001 h→x [{circumflex over ( )} 

 ]  5.4019e−06 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.014: ̂shalshka$

[{circumflex over ( )} 

 xa] 3.72714e−10 {circumflex over ( )} 

 xa {circumflex over ( )}→{circumflex over ( )} s→ 

  w: 0.01 h→x a→a [{circumflex over ( )}cxa] 9.58236e−08 {circumflexover ( )}cxa {circumflex over ( )}→{circumflex over ( )} s→c h→x a→a[{circumflex over ( )} 

 a] 0.000238168 {circumflex over ( )} 

 a {circumflex over ( )}→{circumflex over ( )} sh→ 

a→a [{circumflex over ( )} 

 xa] 1.97438e−11 {circumflex over ( )} 

 xa {circumflex over ( )}→{circumflex over ( )} s→ 

  w: 0.001 h→x a→a [{circumflex over ( )} 

 ] 2.59001e−09 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

a→ 

  w: 0.001 [{circumflex over ( )} 

 a] 1.10777e−07 {circumflex over ( )} 

 a {circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→a [{circumflex over ( )} 

 ] 2.93584e−11 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→ 

  w: 0.0015: ̂shaslhka$

[{circumflex over ( )} 

 ac] 4.24794e−06 {circumflex over ( )} 

 ac {circumflex over ( )}→{circumflex over ( )} sh→ 

a→a s→c [{circumflex over ( )} 

 ] 6.59954e−09 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

a→a s→ 

  w: 0.001 [{circumflex over ( )} 

 c] 2.01154e−11 {circumflex over ( )} 

 c {circumflex over ( )}→{circumflex over ( )} sh→ 

a→ 

  w: 0.001 s→c [{circumflex over ( )} 

 ]  6.9434e−13 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

a→ 

  w: 0.001 s→ 

  w: 0.001 [{circumflex over ( )} 

 ] 2.41509e−11 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→a s→ 

  w: 0.01 [{circumflex over ( )} 

 ac] 4.47293e−08 {circumflex over ( )} 

 ac {circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→a s→c [{circumflex over ( )} 

 ] 1.63144e−12 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→a s→ 

  w: 0.001 [{circumflex over ( )} 

] 1.94088e−15 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→ 

  w: 0.001 s→3 w: 0.01 [{circumflex over ( )} 

 c] 7.63459e−13 {circumflex over ( )} 

 c {circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→ 

  w: 0.001 s→c [{circumflex over ( )} 

 ] 1.06198e−15 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→ 

  w: 0.001 s→ 

  w: 0.0016: ̂shashlka$

[{circumflex over ( )} 

 ] 6.59954e−06 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

a→a sh→ 

[{circumflex over ( )} 

 ] 2.84768e−09 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

a→a sh→ 

  w: 0.01 [{circumflex over ( )} 

 ]  6.9434e−10 {circumflex over ( )}  {circumflex over ( )}→{circumflexover ( )} sh→ 

a→ 

  w: 0.001 sh→ 

[{circumflex over ( )} 

 ] 4.95191e−12 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

a→ 

  w: 0.001 sh→ 

  w: 0.01 [{circumflex over ( )} 

 ] 1.63144e−09 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→a sh→ 

[{circumflex over ( )} 

 ] 8.60848e−11 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→a sh→ 

  w: 0.01 [{circumflex over ( )} 

 ] 1.06198e−12 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→ 

  w: 0.001 sh→ 

[{circumflex over ( )} 

 ] 8.05361e−14 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} sh→ 

  w: 0.01 a→ 

  w: 0.001 s→ 

  w: 0.017: ̂shashkla$

[{circumflex over ( )} 

 κ] 2.52467e−06 {circumflex over ( )} 

 κ {circumflex over ( )}→{circumflex over ( )} sh→ 

a→a sh→ 

k→κ8: ̂shashkal$

[{circumflex over ( )} 

 κa] 5.38716e−07 {circumflex over ( )}  κa {circumflex over( )}→{circumflex over ( )} sh→ 

a→a sh→ 

k→κ a→a9: ̂shashka$l

[a 

 κa$] 1.03159e−13 {circumflex over ( )} 

 κa$ {circumflex over ( )}→{circumflex over ( )} sh→ 

a→a sh→ 

k→κ aa $→$

Result:

EXAMPLE III

Transliteration of record podzol.

This example illustrates the transliteration process being repeated because no result was obtained after the first pass. Indeed, the probability of chain [

] is zero, and the expected result

was discarded at the last stage. The other variants were also discarded at various stages for the same reason.

On the second pass, the variants that failed to coincide with any initial segment of a word in the second language were assigned the low probability 1e-10. Since no variants were discarded during the transliteration process, their total number is large, but a result was obtained.

End | Probability | Transliterated part | Applied rules (with weights)

1: ^|podzol$
[^] | 1 | ^ | ^→^

2: ^p|odzol$

[{circumflex over ( )} 

 ] 0.0356523 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

[{circumflex over ( )}p] 1.40238e−07 {circumflex over ( )}p {circumflexover ( )}→{circumflex over ( )} p→p(1e−05)3: ̂poldzol$

[{circumflex over ( )} 

 o] 0.014841 {circumflex over ( )} 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o [{circumflex over ( )}po] 1.82471e−08 {circumflex over ( )}po{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o4: ̂podlzol$

[{circumflex over ( )} 

 ] 0.00224693 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

[{circumflex over ( )} 

 ] 5.03591e−06 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) [{circumflex over ( )} 

 ] 8.71406e−08 {circumflex over ( )} 

p→ 

o→o d→ 

 (0.2) [{circumflex over ( )}po 

 ] 2.09531e−09 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

[{circumflex over ( )} 

 ] 4.23415e−14 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

 (0.2) [{circumflex over ( )} 

 ]     2e−16 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

 (0.2)5: ̂podzlol$

[{circumflex over ( )} 

 ] 1.52488e−07 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.03) [{circumflex over ( )} 

 ] 1.50648e−05 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

[{circumflex over ( )} 

 ] 2.20269e−08 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.05) [{circumflex over ( )}po 

 ] 5.94019e−13 {circumflex over ( )}po 

  {circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

z→ 

 (0.03) [{circumflex over ( )}po 

 ] 9.18225e−13 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

z→ 

[{circumflex over ( )}po 

 ]     5e−17 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

z→ 

 (0.05) [o 

 ]     4e−22 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0,2) [o 

 ]     4e−22 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.2) [ 

 o 

 ]     2e−11 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.2) [ 

 o 

 ]     6e−13 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) [ 

 o 

 ]     2e−11 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

[ 

 o 

 ]     1e−12 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) [ 

 o 

 ]     6e−13 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) [ 

 o 

 ]     2e−11 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

[ 

 o 

 ]     1e−12 {circumflex over ( )} 

 o 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) [po 

 ]     2e−16 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

z→ 

 (0,2) [po 

 ]     6e−18 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

 (0.2) z→ 

 (0.03) [po 

 ]     2e−16 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

 (0.2) z→ 

[po 

 ]     1e−17 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

 (0.2) z→ 

 (0.05) [po 

 ]     6e−18 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

 (0.2) z→ 

 (0.03) [po 

 ]     2e−16 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

 (0.2) z→ 

[po 

 ]     1e−17 {circumflex over ( )}po 

{circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

 (0.2) z→ 

 (0.05)6: ̂podzoll$

[ 

 o]     4e−32 {circumflex over ( )} 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.2) o→o [ 

 o]     4e−32 {circumflex over ( )} 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.2) o→o [o 

 o]     2e−21 {circumflex over ( )} 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.2) o→o [o 

 o]     6e−23 {circumflex over ( )} 

 o 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) o→o [o 

 o]     2e−21 {circumflex over ( )} 

 o 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

o→o [o 

 o]     1e−22 {circumflex over ( )} 

 o 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) o→o [o 

 o]     6e−23 {circumflex over ( )} 

 o 

 o p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) o→o [o 

 o] 2e−21 {circumflex over ( )} 

 o 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

o→o [o 

 o]     1e−22 {circumflex over ( )} 

 o 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) o→o [ 

 o] 2.73535e−08 {circumflex over ( )} 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.03) o→o [ 

 o] 7.79656e−07 {circumflex over ( )} 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

o→o [ 

 o]     5e−12 {circumflex over ( )} 

 o {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.05) o→o [po 

 o]     3e−17 {circumflex over ( )}po 

 o {circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

z→ 

 (0.03) o→o [po 

 o]     1e−15 {circumflex over ( )}po 

 o {circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

z→ 

o→o [po 

 o]     5e−17 {circumflex over ( )}po 

 o {circumflex over ( )}→{circumflex over ( )} p→p(1e−05) o→o d→ 

z→ 

 (0.05) o→o7: ̂podzoll$

[ 

 ] 1.36767e−29 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.03) o→o l→ 

 (0.05) [ 

 ] 3.11498e−25 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

o→o l→ 

 (0.05) [ 

 ]     2e−31 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.2) o→o l→ 

[ 

 ]   2.5e−33 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.05) o→o l→ 

 (0.05) [ 

 ]     6e−33 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) o→o l→ 

[ 

 ]     2e−31 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

o→o l→ 

[ 

 ]     1e−32 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) o→o l→ 

[ 

 ]     6e−33 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) o→o l→ 

[ 

 ]     2e−31 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

o→o l→ 

[ 

 ]     1e−32 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) o→o l→ 

[ 

 ]     1e−42 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.2) o→o l→ 

 (0.05) [o 

 ] 2.73535e−18 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.03) o→o l→ 

[o 

 ] 6.22996e−14 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

o→o l→ 

[o 

 ]     5e−22 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.05) o→o l→ 

[ 

 ]     3e−44 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) o→o l→ 

 (0.05) [ 

 ]     1e−42 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

o→o l→ 

 (0.05) [ 

 ]     4e−42 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.2) o→o l→ 

[ 

 ]     5e−44 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) o→o l→ 

 (0.05) [ 

 ]     3e−44 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) o→o l→ 

 (0.05) [ 

 ]     1e−42 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

o→o l→ 

 (0.05) [ 

 ]     4e−42 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.2) o→o l→ 

[ 

 ]     5e−44 {circumflex over ( )} 

{circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) o→o l→ 

 (0.05)8: ̂podzol$l

[ 

 $] 2.73535e−28 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.03) o→o l→ 

$→$ [ 

 $] 6.22996e−24 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

o→o l→ 

$→$ [ 

 $]     5e−32 {circumflex over ( )}  $ {circumflex over ( )}→{circumflexover ( )} p→ 

o→o d→ 

z→ 

 (0.05) o→o l→ 

$→$ [ 

 $] 1.36767e−39 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.03) o→o l→ 

 (0.05) $→$ [ 

 $] 3.11498e−35 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

o→o l→ 

 (0.05) $→$ [ 

 $]     2e−41 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.2) o→o l→ 

$→$ [ 

 $]   2.5e−43 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.05) o→o l→ 

 (0.05) $→$ [ 

 $]     6e−43 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) o→o l→ 

$→$ [ 

 $]     2e−41 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

o→o l→ 

$→$ [ 

 $]     1e−42 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) o→o l→ 

$→$ [ 

 $]     6e−43 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.03) o→o l→ 

$→$ [ 

 $]     2e−41 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

o→o l→ 

$→$ [ 

 $]     1e−52 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

z→ 

 (0.2) o→o l→ 

 (0.05) $→$ [ 

 $]     1e−42 {circumflex over ( )} 

 $ {circumflex over ( )}→{circumflex over ( )} p→ 

o→o d→ 

 (0.2) z→ 

 (0.05) o→o l→ 

$→$

Result:

The above examples demonstrate not only the operability of the proposed transliteration method, but also its potential for improving the accuracy and clarity of transliteration in automatic systems, i.e. systems operating without human intervention.

Although the claimed methods are described above by means of exemplary embodiments, these exemplary embodiments can be modified without departing from the spirit and scope of the invention, and they impose no limitations on the scope of the claims.

1. A method for automatic transliteration of a record in a first language to a word in a second language, which involves the following: receiving a sequence of signals, each of which encodes the corresponding character in the record in the first language to be transliterated, and saving the said received signal sequence to memory; during stepwise analysis of the signals from the start to the end of the saved signal sequence, finding all transliteration rules for the character being analyzed or character group starting with the character being analyzed, in the record in the first language to be transliterated, by looking up in the appropriate rule base created in advance; complementing all the transliteration variants obtained at the previous step with characters of the second language from each transliteration rule found at the given step, thus obtaining at the given step composite variants of transliteration associated with this step or with one of the subsequent steps; at each step, comparing each of the aforementioned composite transliteration variants obtained at the given step against starting segments of words in the second language, by looking up in the created base of starting segments of words in the second language, and if a particular transliteration variant associated with the given step matches a starting segment of a word in the second language, retaining this transliteration variant for analysis in subsequent steps; on completion of the last step of the analysis, accepting at least one transliteration variant in the second language associated with the last step as the transliteration result of the said record in the first language; creating a sequence of signals, each of which encodes the corresponding character in the resulting transliterated word of the second language.

2. The method of claim 1, further comprising complementation of the said received signal sequence at its start and end with signals corresponding to preset characters, and the first step of analysis is started from the first of the said preset characters.

3. The method of claim 2, wherein the signals that complement the received signal sequence correspond to preset alphabetic characters.

4. The method of claim 2, wherein the signals that complement the received signal sequence correspond to preset non-alphabetic characters.

5. The method of claim 3, wherein the signals that complement the received signal sequence correspond to different preset characters.

6. The method of claim 3, wherein the signals that complement the received signal sequence correspond to the same preset character.

7. The method of claim 1, wherein the characters used for the record in the first language are selected from the group comprising: alphabetic characters of the first language; non-alphabetic characters, each resembling some alphabetic character of the second language; non-alphabetic characters that in combination resemble some alphabetic character of the second language.

8. The method of claim 1, wherein the said base of starting segments of words in the second language is created in advance.

9. The method of claim 1, wherein the said base of starting segments of words in the second language is created directly during the said comparison, from chains no longer than “m” characters, which chains are obtained from actual words of the second language, and the said starting segments of words in the second language are created by mapping the said chains with a shift by one character, wherein m>1 is an integer pre-selected for the second language; during this process, the occurrence frequency in the second language is determined for each of the said chains, and the chains with occurrence frequency below a preset limit are removed.

10. The method of claim 9, further comprising determination of the probability of each particular transliteration variant associated with the given step and no more than “m” characters long, wherein said probability is determined by finding the occurrence frequency of the corresponding chain, while for each particular variant that is longer than “m” characters and is associated with the given step, its probability is determined by multiplying the occurrence frequencies of those involved said chains “m” characters long which belong to the given particular transliteration variant with overlapping upon shifting each subsequent chain “m” characters long by one character from the beginning of the given particular transliteration variant; if any of the chains is missing, the variant is discarded.

11. The method of claim 10, wherein, during determination of the said probabilities of transliteration variants, the probability calculated at a particular step of said analysis is multiplied by a pre-selected weighting factor of the rule which is used at the given step of analysis to find the transliteration variant.

12. The method of claim 10, wherein if no variant of transliteration in the second language is found upon completing the analysis for the record in the first language being analyzed, the said stepwise analysis of signals is repeated from the beginning to the end of the said saved signal sequence, and if this particular transliteration variant associated with the given step does not match any starting segment of some word in the second language, this transliteration variant is assigned a tentative occurrence frequency equal to a small preset value.

13. The method of claim 10, wherein the number of transliteration variants saved at each step of the analysis does not exceed a preset number, and wherein the said variants end in the same “m−1” characters of the second language and have higher probabilities.

14. The method of claim 13, wherein in the case that the transliteration variant comprising “k” said chains is compared to a transliteration variant comprising “q” said chains, where “k” and “q” are positive integers, the probabilities of both transliteration variants are normalized by “k” and “q”, respectively, before the comparison, by extracting the k-th or q-th root, respectively, from the corresponding probability.

15. The method of claim 14, wherein, beside the number of chains, the number of rules used in each of the compared variants is taken into account by adding the number of rules used therein to “k” and “q”, respectively.

16. The method of claim 1, wherein the accepted transliteration result after the last step is the variant with the highest probability.

17. The method of claim 10, wherein the logarithmic measure of probability is used as the said probability of a transliteration variant.

18. The method of claim 7, wherein the accepted transliteration result after the last step is the variant with the highest probability.

19. The method of claim 7, wherein the logarithmic measure of probability is used as the said probability of a transliteration variant.

20. The method of claim 7, further comprising complementation of the said received signal sequence at its start and end with signals corresponding to preset characters, and the first step of analysis is started from the first of the said preset characters.